
156 research outputs found

Learning from the past with experiment databases

Author: C. Perlich
D. Brain
H. Blockeel
I.H. Witten
J. Vanschoren
M. van Someren
R. Holte
Y. Peng
Publication venue: University of Waikato, Department of Computer Science
Publication date: 01/01/2008

Thousands of machine learning research papers contain experimental comparisons that have usually been conducted with a single focus of interest, and detailed results are usually lost after publication. Once past experiments are collected in experiment databases, they allow for additional and possibly much broader investigation. In this paper, we show how to use such a repository to answer various interesting research questions about learning algorithms and to verify a number of recent studies. Alongside performing elaborate comparisons and rankings of algorithms, we also investigate the effects of algorithm parameters and data properties, and study the learning curves and bias-variance profiles of algorithms to gain deeper insights into their behavior.

CiteSeerX

Crossref

Research Commons@Waikato

Machine learning for targeted display advertising: Transfer learning in action

Author: Dalessandro B
Perlich C
Provost F
Raeder T
Stitelman O
Publication venue: Stern School of Business, New York University
Publication date: 19/02/2013

This paper presents a detailed discussion of problem formulation and data representation issues in the design, deployment, and operation of a massive-scale machine learning system for targeted display advertising. Notably, the machine learning system itself is deployed and has been in continual use for years, for thousands of advertising campaigns (in contrast to simply having the models from the system be deployed). In this application, acquiring sufficient data for training from the ideal sampling distribution is prohibitively expensive. Instead, data are drawn from surrogate domains and learning tasks, and then transferred to the target task. We present the design of this multistage transfer learning system, highlighting the problem formulation aspects. We then present a detailed experimental evaluation, showing that the different transfer stages indeed each add value. We next present production results across a variety of advertising clients from a variety of industries, illustrating the performance of the system in use. We close the paper with a collection of lessons learned from the work over half a decade on this complex, deployed, and broadly used machine learning system.

Statistics Working Papers Series

New York University Faculty Digital Archive

Multidimensional Prediction Models When the Resolution Context Changes

Author: A Bella
A Dhurandhar
C Perlich
G Forman
L Cabibbo
M Golfarelli
M Hall
O Pastor
R Ramakrishnan
R Team
S Chaudhuri
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/09/2015

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-23525-7_31

Multidimensional data is systematically analysed at multiple granularities by applying aggregate and disaggregate operators (e.g., by the use of OLAP tools). For instance, in a supermarket we may want to predict sales of tomatoes for next week, but we may also be interested in predicting sales for all vegetables (higher up in the product hierarchy) for next Friday (lower down in the time dimension). While the domain and data are the same, the operating context is different. We explore several approaches for multidimensional data when predictions have to be made at different levels (or contexts) of aggregation. One method relies on the same resolution, another approach aggregates predictions bottom-up, a third approach disaggregates predictions top-down and a final technique corrects predictions using the relation between levels. We show how these strategies behave when the resolution context changes, using several machine learning techniques in four application domains.

This work was supported by the Spanish MINECO under grants TIN 2010-21062-C02-02 and TIN 2013-45732-C4-1-P, and the REFRAME project, granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences Technologies ERA-Net (CHIST-ERA), and funded by MINECO in Spain (PCIN-2013-037) and by Generalitat Valenciana PROMETEOII2015/013.

Martínez Usó, A.; Hernández Orallo, J. (2015). Multidimensional Prediction Models When the Resolution Context Changes. In Machine Learning and Knowledge Discovery in Databases. Springer. 509-524. https://doi.org/10.1007/978-3-319-23525-7_31

Crossref

RiuNet

Distribution-based aggregation for relational learning with identifier attributes

Author: A. Bradley
A. McCallum
C. Cortes
C. Perlich
Claudia Perlich
Foster Provost
G. Özsoyoğlu
H. Blockeel
J. Quinlan
J.-U. Kietz
M. Craven
N. Lavrač
R. DerSimonian
R. Michalski
S. Muggleton
T. Fawcett
T. Gärtner
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Using ILP to Identify Pathway Activation Patterns in Systems Biology

Author: A Subramanian
AL Tarca
C Perlich
D Croft
D Gamberger
JJ Tyson
K Rhrissorrakrai
K Whelan
L Danon
L Dehaspe
L De Raedt
M Holec
MN McCall
MVM França
N Lavrač
O Kuželka
P Ristoski
PA Flach
R Edgar
R-S Wang
S Draghici
W Kim
W Rongrong
X Robin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

We show a logical aggregation method that, combined with propositionalization methods, can construct novel structured biological features from gene expression data. We do this to gain understanding of pathway mechanisms, for instance, those associated with a particular disease. We illustrate this method on the task of distinguishing between two types of lung cancer: Squamous Cell Carcinoma (SCC) and Adenocarcinoma (AC). We identify pathway activation patterns in pathways previously implicated in the development of cancers. Our method identified a model with comparable predictive performance to the winning algorithm of a recent challenge, while providing biologically relevant explanations that may be useful to a biologist.

Crossref

PubMed Central

King's Research Portal

Explore Bristol Research

Logistic Model Trees

Author: C. Nadeau
C. Perlich
E. Frank
Eibe Frank
I. H. Witten
J. Friedman
J. Gama
K. Y. Chan
Mark Hall
Niels Landwehr
R. Ihaka
R. Kohavi
T.-S. Lim
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Machine learning for targeted display advertising: transfer learning in action

Author: B. Dalessandro
B. Zadrozny
C. Perlich
D. Agarwal
D. Jensen
F. Provost
G. Weiss
H. Zou
J. Attenberg
K. Weinberger
L. Bottou
L. Breiman
O. Stitelman
P. Ipeirotis
S. Pan
S. Pandey
S. Rosset
T. Evgeniou
T. Fawcett
T. Heskes
T. Raeder
Y. Chen
Y. Liu
Y. Xue
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Fast relational learning using bottom clause propositionalization with artificial neural networks

Relational learning can be described as the task of learning first-order logic rules from examples. It has enabled a number of new machine learning applications, e.g. graph mining and link analysis. Inductive Logic Programming (ILP) performs relational learning either directly by manipulating first-order rules or through propositionalization, which translates the relational task into an attribute-value learning task by representing subsets of relations as features. In this paper, we introduce a fast method and system for relational learning based on a novel propositionalization called Bottom Clause Propositionalization (BCP). Bottom clauses are boundaries in the hypothesis search space used by the ILP systems Progol and Aleph. Bottom clauses carry semantic meaning and can be mapped directly onto numerical vectors, simplifying the feature extraction process. We have integrated BCP with a well-known neural-symbolic system, C-IL2P, to perform learning from numerical vectors. C-IL2P uses background knowledge in the form of propositional logic programs to build a neural network. The integrated system, which we call CILP++, handles first-order logic knowledge and is available for download from Sourceforge. We have evaluated CILP++ on seven ILP datasets, comparing results with Aleph and a well-known propositionalization method, RSD. The results show that CILP++ can achieve accuracy comparable to Aleph, while being generally faster. BCP achieved a statistically significant improvement in accuracy in comparison with RSD when running with a neural network, but BCP and RSD perform similarly when running with C4.5. We have also extended CILP++ to include a statistical feature selection method, mRMR, with preliminary results indicating that a reduction of more than 90% of features can be achieved with a small loss of accuracy.

CiteSeerX

City Research Online

Crossref

Theopolis Monk: Envisioning a Future of A.I. Public Service

Author: A Plantinga
Boris Delibasic
C. Perlich
Colin G. Walsh
D George
DC Schuurman
E Morozov
F Pasquale
H A Haenssle
J Pearl
J Weizenbaum
JF Nash
Joanna J. Bryson
L van der Maaten
N Bostrom
PB Wigley
R Kurzweil
RH Wortham
Ryan Porter
S Dieleman
S Vallor
Sanford Kessler
Seung Seog Han
Stephen F. Weng
Publication date: 01/01/2019

Visions of future applications of artificial intelligence tend to veer toward the naively optimistic or frighteningly dystopian, neglecting the numerous human factors necessarily involved in the design, deployment and oversight of such systems. The dream that AI systems may somehow replace the irregularities and struggles of human governance with unbiased efficiency is seen to be non-scientific and akin to a religious hope, whereas the current trajectory of AI development indicates that it will increasingly serve as a tool by which humans exercise control over other humans. To facilitate the responsible development of AI systems for the public good, we discuss current conversations on the topics of transparency and accountability.

PhilPapers

Crossref

On the assessment by grazing-incidence small-angle X-ray scattering of replica quality in polymer gratings fabricated by nanoimprint lithography

Author: Baumbach
Cecchini
Chandross
Chen
D. E. Martínez-Tong
D. R. Rueda
Dirckx
E. Rebollar
F. Pérez-Murano
Hernández
Hlaing
Hu
Hu
I. Martín-Fabiani
Lee
Lochbihler
M. C. García-Gutiérrez
M. Soccio
Mayer
Meier
Meier
Metzger
Mikulík
Mills
N. Alayo
Perlich
Roth
Rowland
Rueda
Schift
T. A. Ezquerra
Timmann
Wernecke
Yan
Yoneda
Publication venue: 'International Union of Crystallography (IUCr)'

Crossref